rr library(fable) library(sweep)
Registered S3 method overwritten by 'xts':
method from
as.zoo.xts zoo
The fable package is a tidy renovation of the forecast package, and it explores new interfaces for modelling and subsequent analysis in R. For users experienced with the tidyverse, modelling in R can be a jarring experience. Models in R can be difficult to work with as there is little standardisation in model object structures and interfaces. This is partially alleviated from tidy modelling packages such as broom (and sweep for time series) which can be used to extract key features from model objects in suitable formats for tidy workflows.
the fable package applies tidyverse principles to time series modelling, making the forecasting workflow seamlessly integrate with other tidyverse packages.
rr library(fable) library(tsibble) library(tsibbledata) library(lubridate) library(dplyr)
The process of producing forecasts for time series data can be broken down into a few steps.
Data Prep
The first step in forecasting is to prepare data in the correct format. This process may involve loading in data, identifying missing values, filtering the time series and other pre-processing tasks. The functionality provided by tsibble and other packages in the tidyverse substantially simplifies this step.
Many models have different data requirements, some require the series to be in time order, others require no missing values. Checking your data is an essential step to understanding its features and it is useful to do before models are estimated.
The way in which your data is prepared can also be used to explore different features of the time series. As we will see later, pre-processing your dataset is an important step in evaluating model performance using cross-validation.
rr global_economy
Visualise
rr global_economy %>% filter(Country==) %>% autoplot(GDP) + ggtitle(for Sweden) + ylab($US billions)

Define a model
Before fitting a model to the data, we first must describe the model. There are many different time series models that can be used for forecasting, and much of this book is dedicated to describing various models. Specifying an appropriate model for the data is essential for producing appropriate forecasts.
Models in R are specified using model functions, which each use a formula (y ~ x) interface. The response variable(s) are specified on the left of the formula, and the structure of the model is written on the right.
For example, a time series linear model that models GDP can be specified using:
TSLM(GDP ~ trend()).
In this case the model function is TSLM() (time series linear model), the response variable is GDP and it is being modelled using trend() (a “special” function specifying a linear trend).
Train the model
Once an appropriate model is specified, we next train the model on some data. One or more model specifications can be estimated using the model() function.
To estimate the model in our example, we use:
rr fit <- global_economy %>% model(trend_model = TSLM(GDP ~ trend()))
7 errors (1 unique) encountered for trend_model
[7] 0 (non-NA) cases
This fits a linear trend model to the GDP data for each combination of key variables in the tsibble. In this example, it will fit a model to each of the 263 countries in the dataset. The resulting object is a model table or a “mable”.
rr fit
Each row corresponds to one combination of the key variables. The trend_model column contains information about the fitted model for each country.
Check/evaluate your model
Once a model has been fitted, it is important to check how well it has performed on the data. There are several diagnostic tools available to check model behaviour, and also accuracy measures that allow one model to be compared against another.
Produce forecasts
With an appropriate model specified, estimated and checked, it is time to produce the forecasts using forecast(). The easiest way to use this function is by specifying the number of future observations to forecast. For example, forecasts for the next 10 observations can be generated using h = 10. We can also use natural language; e.g., h = “2 years” can be used to predict two years into the future.
rr fit %>% forecast(h = years)
Could not bias adjust the point forecasts as the forecast standard deviation is unknown. Perhaps your series is too short or insufficient bootstrap samples are used.
This is a forecasting table, or “fable”. Each row corresponds to one forecast period for each country. The GDP column contains the point forecast, while the .distribution column contains the forecasting distribution. The point forecast is the mean (or average) of the forecasting distribution.
The forecasts can be plotted along with the historical data using autoplot() as follows.
rr fit %>% forecast(h = years) %>% filter(Country==) %>% autoplot(global_economy) + ggtitle(for Sweden) + ylab($US billions)
Could not bias adjust the point forecasts as the forecast standard deviation is unknown. Perhaps your series is too short or insufficient bootstrap samples are used.

Forecasting methods
- Average method
Here, the forecasts of all future values are equal to the average (or “mean”) of the historical data.
rr global_economy %>% model(MEAN(GDP)) %>% autoplot()
7 errors (1 unique) encountered for MEAN(GDP)
[7] All observations are missing, a model cannot be estimated without data.
Error: Objects of type mdl_df/tbl_df/tbl/data.frame not supported by autoplot.
- Naive method
For naïve forecasts, we simply set all forecasts to be the value of the last observation. These are also known as random walk forecasts, as this method is optimal when data follows a random walk model. This method works remarkably well for many economic and financial time series.
rr global_economy
Error in global_economy(.) : could not find function \global_economy\
- Seasonal naive method
A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season of the year (e.g., the same month of the previous year).
This looks more complicated than it really is. For example, with monthly data, the forecast for all future February values is equal to the last observed February value. With quarterly data, the forecast of all future Q2 values is equal to the last observed Q2 value (where Q2 means the second quarter). Similar rules apply for other months and quarters, and for other seasonal periods.
rr global_economy %>% global_economy %>% model(SNAIVE(GDP ~ lag()))
Error in global_economy(.) : could not find function \global_economy\
rr model(SNAIVE(GDP ~ lag()))
Error in UseMethod(\model\) :
no applicable method for 'model' applied to an object of class \c('mdl_defn'
- Drift method
A variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data. This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.
rr global_economy %>% model(RW(GDP ~ drift()))
|=========== | 38% ~31 s remaining
|============= | 42% ~27 s remaining
|============== | 46% ~23 s remaining
|============== | 47% ~22 s remaining
|================ | 52% ~18 s remaining
|================= | 57% ~15 s remaining
|=================== | 62% ~12 s remaining
|===================== | 68% ~9 s remaining
|====================== | 73% ~7 s remaining
|======================= | 77% ~6 s remaining
|======================== | 80% ~5 s remaining
|========================== | 84% ~4 s remaining
|========================== | 86% ~3 s remaining
|============================ | 90% ~2 s remaining
|============================ | 93% ~1 s remaining
|============================== | 99% ~0 s remaining
7 errors (1 unique) encountered for RW(GDP ~ drift())
[7] All observations are missing, a model cannot be estimated without data.
cannot open compressed file '/Users/stephanieboyle/.rstudio-desktop/notebooks/F8199C2F-time_series_forecasting/1/A01BA05017392E95/cwri0ao840vm5_t/_rs_rdf_372a75dfc3df.rdf', probable reason 'No such file or directory'Error in gzfile(file, \wb\) : cannot open the connection
Now it’s your turn:
rr # Set training data from 1992 to 2006 train <- aus_production %>% filter_index(992 Q1 ~ 006 Q4) # Fit the models beer_fit <- train %>% model( Mean = MEAN(Beer), Naïve = NAIVE(Beer), Seasonal naïve = SNAIVE(Beer) ) # Generate forecasts for 14 quarters beer_fc <- beer_fit %>% forecast(h=14) # Plot forecasts against actual values beer_fc %>% autoplot(train, level = NULL) + autolayer(filter_index(aus_production, 007 Q1 ~ .), color = ) + ggtitle(for quarterly beer production) + xlab() + ylab() + guides(colour=guide_legend(title=))
In this case, only the seasonal naïve forecasts are close to the observed values from 2007 onwards.
rr
aus_retail1 <- aus_retail %>% filter(State %in% c(South Wales, ), Industry == stores)
aus_retail1 %>% model( ets = ETS(box_cox(Turnover, 0.3)), arima = ARIMA(log(Turnover)), snaive = SNAIVE(Turnover) ) %>% forecast(h = years) %>% autoplot(filter(aus_retail, year(Month) > 2010), level = NULL)
rr bricks <- aus_production %>% filter_index(1970 ~ 2004) bricks %>% model(MEAN(Bricks)) bricks %>% model(NAIVE(Bricks)) bricks %>% model(SNAIVE(Bricks ~ lag())) bricks %>% model(RW(Bricks ~ drift()))
the non-seasonal methods are applied to Google’s daily closing stock price in 2015, and used to forecast one month ahead. Because stock prices are not observed every day, we first set up a new time index based on the trading days rather than calendar days.
rr # Re-index based on trading days google_stock <- gafa_stock %>% filter(Symbol == ) %>% mutate(day = row_number()) %>% update_tsibble(index = day, regular = TRUE) # Filter the year of interest google_2015 <- google_stock %>% filter(year(Date) == 2015) # Fit the models google_fit <- google_2015 %>% model( Mean = MEAN(Close), Naïve = NAIVE(Close), Drift = NAIVE(Close ~ drift()) ) # Produce forecasts for the 19 trading days in January 2015 google_fc <- google_fit %>% forecast(h = 19) # A better way using a tsibble to determine the forecast horizons google_jan_2016 <- google_stock %>% filter(yearmonth(Date) == yearmonth(016 Jan)) google_fc <- google_fit %>% forecast(google_jan_2016) # Plot the forecasts google_fc %>% autoplot(google_2015, level = NULL) + autolayer(google_jan_2016, Close, color=‘black’) + ggtitle(stock (daily ending 31 Dec 2015)) + xlab() + ylab(Price (US$)) + guides(colour=guide_legend(title=))
Sometimes one of these simple methods will be the best forecasting method available; but in many cases, these methods will serve as benchmarks rather than the method of choice. That is, any forecasting methods we develop will be compared to these simple methods to ensure that the new method is better than these simple alternatives. If not, the new method is not worth considering.
ACF and PACF plots
We can also see how seasonal our data is by looking at ACF and PACF plots.
# acf plots
holidays %>%
ACF(Trips) %>%
autoplot()
holidays %>%
PACF(Trips,4) %>%
autoplot()
Here, the low seasonality in the ACT is evident compared to the other states.
Finally, we show a composite plot created using gg_tsdisplay(). This is a little different from the corresponding ggtsdisplay() function in the forecast package which showed the PACF in the bottom right panel by default. I think the season plot is a little more informative for exploratory data analysis, so that is what is shown by default in this new function. The other panels are the same.
holidays %>%
filter(State=="Tasmania") %>%
gg_tsdisplay(Trips)
---
title: "R Notebook"
output: html_notebook
---

```{r}

library(fable)
library(sweep)

```


The fable package is a tidy renovation of the forecast package, and it explores new interfaces for modelling and subsequent analysis in R. For users experienced with the tidyverse, modelling in R can be a jarring experience. Models in R can be difficult to work with as there is little standardisation in model object structures and interfaces. This is partially alleviated from tidy modelling packages such as broom (and sweep for time series) which can be used to extract key features from model objects in suitable formats for tidy workflows.

the fable package applies tidyverse principles to time series modelling, making the forecasting workflow seamlessly integrate with other tidyverse packages.

```{r}
library(fable)
library(tsibble)
library(tsibbledata)
library(lubridate)
library(dplyr)
```




The process of producing forecasts for time series data can be broken down into a few steps.

<center>
![](https://otexts.com/fpp3/fpp_files/figure-html/workflow-1.png)
</center>


# Data Prep

The first step in forecasting is to prepare data in the correct format. This process may involve loading in data, identifying missing values, filtering the time series and other pre-processing tasks. The functionality provided by tsibble and other packages in the tidyverse substantially simplifies this step.

Many models have different data requirements, some require the series to be in time order, others require no missing values. Checking your data is an essential step to understanding its features and it is useful to do before models are estimated.

The way in which your data is prepared can also be used to explore different features of the time series. As we will see later, pre-processing your dataset is an important step in evaluating model performance using cross-validation.

```{r}
global_economy
```


# Visualise 

```{r}
global_economy %>%
  filter(Country=="Sweden") %>%
  autoplot(GDP) +
    ggtitle("GDP for Sweden") + ylab("$US billions")
```


# Define a model 

Before fitting a model to the data, we first must describe the model. There are many different time series models that can be used for forecasting, and much of this book is dedicated to describing various models. Specifying an appropriate model for the data is essential for producing appropriate forecasts.

Models in R are specified using model functions, which each use a formula (y ~ x) interface. The response variable(s) are specified on the left of the formula, and the structure of the model is written on the right.

For example, a time series linear model that models GDP can be specified using:

```{r, eval = FALSE}
TSLM(GDP ~ trend()).
```

In this case the model function is TSLM() (time series linear model), the response variable is GDP and it is being modelled using trend() (a “special” function specifying a linear trend).


# Train the model 

Once an appropriate model is specified, we next train the model on some data. One or more model specifications can be estimated using the model() function.

To estimate the model in our example, we use:

```{r}
fit <- global_economy %>%
  model(trend_model = TSLM(GDP ~ trend()))
```

This fits a linear trend model to the GDP data for each combination of key variables in the tsibble. In this example, it will fit a model to each of the 263 countries in the dataset. The resulting object is a model table or a “mable”.

```{r}
fit
```

Each row corresponds to one combination of the key variables. The trend_model column contains information about the fitted model for each country. 

# Check/evaluate your model 

Once a model has been fitted, it is important to check how well it has performed on the data. There are several diagnostic tools available to check model behaviour, and also accuracy measures that allow one model to be compared against another.

# Produce forecasts


With an appropriate model specified, estimated and checked, it is time to produce the forecasts using forecast(). The easiest way to use this function is by specifying the number of future observations to forecast. For example, forecasts for the next 10 observations can be generated using h = 10. We can also use natural language; e.g., h = "2 years" can be used to predict two years into the future.

```{r}
fit %>% forecast(h = "2 years")
```


This is a forecasting table, or “fable”. Each row corresponds to one forecast period for each country. The GDP column contains the point forecast, while the .distribution column contains the forecasting distribution. The point forecast is the mean (or average) of the forecasting distribution.

The forecasts can be plotted along with the historical data using autoplot() as follows.

```{r}
fit %>% forecast(h = "2 years") %>%
  filter(Country=="Sweden") %>%
  autoplot(global_economy) +
    ggtitle("GDP for Sweden") + ylab("$US billions")
```


## Forecasting methods

1. Average method

Here, the forecasts of all future values are equal to the average (or “mean”) of the historical data. 

```{r}
global_economy %>%
  model(MEAN(GDP))

```

2. Naive method

For naïve forecasts, we simply set all forecasts to be the value of the last observation. These are also known as random walk forecasts, as this method is optimal when data follows a random walk model. This method works remarkably well for many economic and financial time series.

```{r}
global_economy %>%
  model(NAIVE(GDP))
```

3. Seasonal naive method

A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season of the year (e.g., the same month of the previous year).

This looks more complicated than it really is. For example, with monthly data, the forecast for all future February values is equal to the last observed February value. With quarterly data, the forecast of all future Q2 values is equal to the last observed Q2 value (where Q2 means the second quarter). Similar rules apply for other months and quarters, and for other seasonal periods.

```{r}
global_economy %>%
  model(SNAIVE(GDP ~ lag("year")))
```


4. Drift method

A variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data. This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.


```{r}
global_economy %>% model(RW(GDP ~ drift()))
```

Now it's your turn:

```{r}
# Set training data from 1992 to 2006
train <- aus_production %>% filter_index("1992 Q1" ~ "2006 Q4")
# Fit the models
beer_fit <- train %>%
  model(
    Mean = MEAN(Beer),
    `Naïve` = NAIVE(Beer),
    `Seasonal naïve` = SNAIVE(Beer)
  )
# Generate forecasts for 14 quarters
beer_fc <- beer_fit %>% forecast(h=14)
# Plot forecasts against actual values
beer_fc %>%
  autoplot(train, level = NULL) +
    autolayer(filter_index(aus_production, "2007 Q1" ~ .), color = "black") +
    ggtitle("Forecasts for quarterly beer production") +
    xlab("Year") + ylab("Megalitres") +
    guides(colour=guide_legend(title="Forecast"))

# In this case, only the seasonal naïve forecasts are close to the observed values from 2007 onwards.
```


```{r}


aus_retail1 <- aus_retail %>%
  filter(State %in% c("New South Wales", "Victoria"),
    Industry == "Department stores")

aus_retail1 %>%
  model(
    ets = ETS(box_cox(Turnover, 0.3)),
    arima = ARIMA(log(Turnover)),
    snaive = SNAIVE(Turnover)
  ) %>%
  forecast(h = "2 years") %>% 
  autoplot(filter(aus_retail, year(Month) > 2010), level = NULL)
```

```{r}
bricks <- aus_production %>% filter_index(1970 ~ 2004)
bricks %>% model(MEAN(Bricks))
bricks %>% model(NAIVE(Bricks))
bricks %>% model(SNAIVE(Bricks ~ lag("year")))
bricks %>% model(RW(Bricks ~ drift()))
```


the non-seasonal methods are applied to Google’s daily closing stock price in 2015, and used to forecast one month ahead. Because stock prices are not observed every day, we first set up a new time index based on the trading days rather than calendar days.

```{r}
# Re-index based on trading days
google_stock <- gafa_stock %>%
  filter(Symbol == "GOOG") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)
# Filter the year of interest
google_2015 <- google_stock %>% filter(year(Date) == 2015)
# Fit the models
google_fit <- google_2015 %>%
  model(
    Mean = MEAN(Close),
    `Naïve` = NAIVE(Close),
    Drift = NAIVE(Close ~ drift())
  )
# Produce forecasts for the 19 trading days in January 2015
google_fc <- google_fit %>% forecast(h = 19)
# A better way using a tsibble to determine the forecast horizons
google_jan_2016 <- google_stock %>%
  filter(yearmonth(Date) == yearmonth("2016 Jan"))
google_fc <- google_fit %>% forecast(google_jan_2016)
# Plot the forecasts
google_fc %>%
  autoplot(google_2015, level = NULL) +
    autolayer(google_jan_2016, Close, color='black') +
    ggtitle("Google stock (daily ending 31 Dec 2015)") +
    xlab("Day") + ylab("Closing Price (US$)") +
    guides(colour=guide_legend(title="Forecast"))
```


Sometimes one of these simple methods will be the best forecasting method available; but in many cases, these methods will serve as benchmarks rather than the method of choice. That is, any forecasting methods we develop will be compared to these simple methods to ensure that the new method is better than these simple alternatives. If not, the new method is not worth considering.





### ACF and PACF plots

We can also see how seasonal our data is by looking at ACF and PACF plots. 

```{r, eval = FALSE}
# acf plots 
holidays %>% 
  ACF(Trips) %>%
  autoplot()


holidays %>% 
  PACF(Trips,4) %>%
  autoplot()

```
Here, the low seasonality in the ACT is evident compared to the other states.


Finally, we show a composite plot created using `gg_tsdisplay()`. This is a little different from the corresponding `ggtsdisplay()` function in the forecast package which showed the PACF in the bottom right panel by default. I think the season plot is a little more informative for exploratory data analysis, so that is what is shown by default in this new function. The other panels are the same.

```{r}
holidays %>% 
  filter(State=="Tasmania") %>% 
  gg_tsdisplay(Trips)
```

